Multi-Language Image Description with Neural Sequence Models

نویسندگان

  • Desmond Elliott
  • Stella Frank
  • Eva Hasler
چکیده

In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and a description in the source language. In image description experiments on the IAPR-TC12 dataset of images aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Volumetric soil moisture estimation using Sentinel 1 and 2 satellite images

Surface soil moisture is an important variable that plays a crucial role in the management of water and soil resources. Estimating this parameter is one of the important applications of remote sensing. One of the remote sensing techniques for precise estimation of this parameter is data-driven models. In this study, volumetric soil moisture content was estimated using data-driven models, suppor...

متن کامل

Multimodal Attention for Neural Machine Translation

The attention mechanism is an important part of the neural machine translation (NMT) where it was reported to produce richer source representation compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simult...

متن کامل

A Multi-scale Multiple Instance Video Description Network

Generating natural language descriptions for in-thewild videos is a challenging task. Most state-of-the-art methods for solving this problem borrow existing deep convolutional neural network (CNN) architectures (Alexnet, Googlenet) to extract a visual representation of the input video. However, these deep CNN architectures are designed for single-label centered-positioned object classification....

متن کامل

Image Captioning with Object Detection and Localization

Automatically generating a natural language description of an image is a task close to the heart of image understanding. In this paper, we present a multi-model neural network method closely related to the human visual system that automatically learns to describe the content of images. Our model consists of two sub-models: an object detection and localization model, which extract the informatio...

متن کامل

Subhashini VenugopalanProposal

For most people, watching a brief video and describing what happened (in words) is an easy task. For machines, extracting the meaning from video pixels and generating a sentence description is a very complex problem. The goal of my research is to develop models that can automatically generate natural language (NL) descriptions for events in videos. As a first step, this proposal presents deep r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1510.04709  شماره 

صفحات  -

تاریخ انتشار 2015